Abstract

Background: The Amoebozoa constitute one of the primary divisions of eukaryotes, encompassing taxa of both
biomedical and evolutionary importance, yet their genomic diversity remains largely unsampled. Here we present an
analysis of a whole genome assembly of Acanthamoeba castellanii (Ac), the first representative from a solitary
free-living amoebozoan.

Results: Ac encodes 15,455 compact intron-rich genes, a significant number of which are predicted to have arisen 
through inter-kingdom lateral gene transfer (LGT). A majority of the LGT candidates have undergone a substantial 
degree of intronization and Ac appears to have incorporated them into established transcriptional programs. Ac 
manifests a complex signaling and cell communication repertoire, including a complete tyrosine kinase signaling 
toolkit and a comparable diversity of predicted extracellular receptors to that found in the facultatively multicellular 
dictyostelids. An important environmental host of a diverse range of bacteria and viruses, Ac utilizes a diverse 
repertoire of predicted pattern recognition receptors, many with predicted orthologous functions in the innate 
immune systems of higher organisms. 

Conclusions: Our analysis highlights the important role of LGT in the biology of Ac and in the diversification of 
microbial eukaryotes. The early evolution of a key signaling facility implicated in the evolution of metazoan 
multicellularity strongly argues for its emergence early in the Unikont lineage. Overall, the availability of an Ac 
genome should aid in deciphering the biology of the Amoebozoa and facilitate functional genomic studies in this 
important model organism and environmental host. 



Background 

Acanthamoeba castellanii (Ac) is one of the predominant
soil organisms in terms of population size and distribution,
where it acts both as a predator and an environmental
reservoir for a number of bacterial, fungal and viral
species [1]. Selective grazing by Ac in the rhizosphere
alters microbial community structure and is an important
contributor to the development of root architecture and
nutrient uptake by plants [2]. Ac can also be isolated
from almost any body of water and manifests in a wide
variety of man-made water systems, including potable
water sources, swimming pools, hot tubs, showers and
hospital air conditioning units [3,4]. Acanthamoebae are
frequently associated with a diverse range of bacterial
symbionts [5,6]. A subset of the microbes that serve as
prey for Ac have evolved virulence stratagems to use Ac
as both a replicative niche and as a vector for dispersal
and are important human intracellular pathogens [7,8].
These pathogens utilize analogous strategies to infect and
persist within mammalian macrophages, illustrating the
role of environmental hosts such as Ac in the evolution
and maintenance of virulence [9,10]. Commonalities at
the level of host response between amoebae and macrophages
to such pathogens have led to the use of both
Dictyostelium discoideum (Dd) and Ac as model systems to
study pathogenesis [11,12].

Published Amoebozoa genomes from the obligate
parasite Entamoeba histolytica (Eh) and the facultatively
multicellular Dd have both highlighted unexpected
complexities at the level of cell motility and signaling
[13,14]. As the only solitary free-living representative,
the genome of Ac establishes a unique reference point
for interpreting other amoebozoan genomes.
Experimentally, Ac has been more thoroughly studied
than most other free-living amoebae, acting as a model
organism for studies on the cytoskeleton, cell movement,
and aspects of gene regulation, with a large body of
literature supporting its molecular interactions [15-18].

Results and discussion 

Lateral gene transfer 

Lateral gene transfer (LGT) is considered a key process
of genome evolution and several studies have indicated
that phagotrophs manifest an increased rate of LGT
compared to non-phagotrophic organisms [19]. As a
geographically dispersed bacterivorous amoeba with a
penchant for harboring endosymbionts, Ac encounters a
rich and diverse supply of foreign DNA, providing
ample opportunity for LGT. Homology-based searches
of the proteome illustrate the potential for diverse
contributions to the genome (Figure 1).

We therefore undertook a phylogenomic analysis to
determine cases of predicted inter-domain LGT in the
Ac genome (Section 2 of Additional file 1). Our analysis
identified 450 genes, or 2.9% of the proteome, predicted
to have arisen through LGT (Figure 2; Section 2 of
Additional file 1). To determine the fate and ultimate
utility of the LGT candidates within the Ac genome, we
examined their expression levels across a number of
experimental conditions using RNA-seq (Table S1.6.1 in
Additional file 1). Our results show that most of the
LGT candidates are expressed in at least some of the
conditions tested (Additional file 2).
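
As a quick arithmetic check on the figures quoted in the text, the following minimal Python sketch recomputes the LGT share from the two reported counts (15,455 predicted genes and 450 LGT candidates). The function name is purely illustrative and is not part of the authors' pipeline.

```python
def lgt_fraction(lgt_candidates: int, total_genes: int) -> float:
    """Return the share of genes predicted to derive from LGT, in percent."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 100.0 * lgt_candidates / total_genes


# Counts taken from the text: 450 LGT candidates among 15,455 genes.
pct = lgt_fraction(450, 15455)
print(f"{pct:.1f}% of predicted genes are LGT candidates")
```

Rounded to one decimal place, the result agrees with the 2.9% stated above.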

Genetic exchange is also thought to occur between
phylogenetically disparate organisms that reside within the
same amoebal host cell [20,21]. Ac contains three copies
of a miniature transposable element (ISSoc2) of the
IS607 family of insertion sequences related to those
present in genomes of thermophilic cyanobacteria [22] and
several giant nucleocytoplasmic large DNA viruses
(NCLDVs). In the Mimivirus genome the IS elements
are found within islands of genes of bacterial origin,
some of which appear to have been contributed by a
cyanobacterial donor. These data underscore the complex
intermediary role that Ac, as host to both NCLDVs
and cyanobacteria [17], may play in facilitating genetic
transfer between sympatric species.

Comparison of predicted LGT across amoeboid genomes 

To compare the impact and scale of LGT across
Ac and other amoebae, we applied the same phylogenomic
approach used to identify LGT in the Ac genome
to published genomes of other amoeboid protists,
including Dd, Eh, Entamoeba dispar (Ed) and Naegleria
gruberi (Ng). Our findings predict that Ac and the excavate
Ng encode a notably higher number of laterally
acquired bacterial genes than either of the more closely
related parasitic Entamoeba or the social Dd amoebozoans
(Figure 2a). The taxonomic distribution of putative
LGT donors is broadly similar for both Entamoeba species,
but surprisingly also between Ac and Ng (Figure 2b,c;
Section 2 of Additional file 1). The genomes of both Eh
and Ed are predicted to have experienced a proportionately
higher influx from anaerobic and host-associated
microbes than their free-living counterparts Ac and Ng
(Figure 2c; Additional file 2), likely reflecting the
composition of microbes within their habitats. Many of the LGT
candidates across all of the amoebae have predicted
metabolic functions, suggesting that LGT in amoebae is
reflective of trophic strategy and driven by the selective pressure
of new ecological niches. Our data illustrating LGT as a
contributing factor in shaping the biology of a diversity of
amoeboid genomes provide further evidence supporting
an underappreciated role for LGT in the diversification of
microbial eukaryotes [23].

Introns 

Intron-exon structures exhibit complex phylogenetic
patterns with orders-of-magnitude differences across
eukaryotic lineages, which imply frequent transformations
during eukaryotic evolution [24]. Some researchers have
argued that intron gain is episodic with long periods of
stasis [25] punctuated by periods of rapid gain while
others argue for generally higher rates [26]. Strikingly, Ac
genes have an average of 6.2 introns per gene, among the
highest known in eukaryotes [27]. Genes predicted to have
arisen through LGT have slightly lower but broadly
comparable intron densities, offering an opportunity to study
the evidence for proposed mechanisms underpinning
post-LGT intron gain [28]. An analysis of LGT introns,
however, did not provide support for any of the proposed
mechanisms of intron gain (Section 2 of Additional file 1).
Thus, while the preponderance of introns in LGTs clearly
indicates substantial intron gain at some point, it appears
that, for Ac, these events have been very rare in recent
times, consistent with a punctate model of intron gain.

Figure 1 Measures of the composition of the Ac genome based on sequence similarity. For each protein, the best BLASTP hit to the non-redundant
database, that is, the match with the lowest e-value, was recovered and the classification of the corresponding organism was
extracted according to NCBI taxonomy. The central bar represents the full complement of annotated Ac genes exhibiting a best BLASTP hit
respectively against the four kingdoms - Eukaryota (blue), Bacteria (red), Archaea (green) and viruses (purple) - with orphan genes depicted in
yellow. Results for Eukaryota are subdivided according to the major taxonomic phyla in varying shades of blue. Subdivisions of phyla within the
Bacteria (red shading), Archaea (green shading) and viruses (purple shading) are depicted in the expanded upper and lower sidebars. dsDNA,
double-stranded DNA.
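
The best-hit tally described in the Figure 1 legend (keep the lowest e-value match per protein, then count the taxonomic group of that match) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the tuple layout, the function names and the toy records are all hypothetical stand-ins for parsed BLASTP output with a taxonomy lookup already applied.

```python
from collections import Counter


def best_hits(hits):
    """Keep, for each query protein, the hit with the lowest e-value.

    `hits` is an iterable of (query_id, group, evalue) tuples; this
    layout is a hypothetical stand-in for parsed BLASTP results.
    """
    best = {}
    for query, group, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (group, evalue)
    return best


def composition(hits, all_queries):
    """Count best-hit groups; proteins with no hit are tallied as orphans."""
    best = best_hits(hits)
    counts = Counter(group for group, _ in best.values())
    counts["Orphan"] = sum(1 for q in all_queries if q not in best)
    return counts


# Toy records, invented purely for illustration.
toy_hits = [
    ("p1", "Eukaryota", 1e-50),
    ("p1", "Bacteria", 1e-20),
    ("p2", "Bacteria", 1e-80),
    ("p2", "Eukaryota", 1e-10),
    ("p3", "Viruses", 1e-5),
]
counts = composition(toy_hits, ["p1", "p2", "p3", "p4"])
print(dict(counts))
```

A best-hit tally of this kind only summarizes sequence similarity; the LGT predictions in the text rest on the separate phylogenomic analysis.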

Cell signaling 

As a unicellular sister grouping to the multicellular
Dictyostelids, Ac provides a unique point of comparison to
gain insight into the molecular underpinnings of multicellular
development in Amoebozoa. Cell-cell communication
is a hallmark of multicellularity and we looked at putative
receptors for extracellular signals and their downstream
targets. G-protein-coupled receptors (GPCRs) represent
one of the largest families of sensors for extracellular
stimuli. Overall, Ac encodes 35 GPCRs (compared to 61 in
Dd), representing 4 out of the 6 major families of GPCRs
[29] while lacking metabotropic glutamate-like GPCRs or
fungal pheromone receptors. We identified three predicted
fungal-associated glucose-sensing Git3 GPCRs [30] and an
expansion in the number of frizzled/smoothened receptors
[31] (Figure S3.1.1 in Additional file 1). We identified seven
G-protein alpha subunits and a single putative target,
phospholipase C, for GPCR-mediated signaling. The number
and diversity of receptors in Ac raises the question of what
they are likely to be sensing. Nematodes employ many of
their GPCRs in detecting molecules secreted by their
bacterial food sources [32], and given the diversity of Ac's
feeding environments, many of the Ac GPCRs may fulfill a
similar role.

Environmental sensing 

We identified 48 sensor histidine kinases (SHKs), of which 
17 harbor transmembrane domains and may function as 
receptors (Figure S3.2.1 in Additional file 1). Remarkably, 
there are also 67 nucleotidyl cyclases consisting of an 
extracellular receptor domain separated by a single trans- 
membrane helix from an intracellular cyclase domain 
flanked by two serine/threonine kinase domains. This 
domain configuration is present in a number of the 
amoeba-infecting giant viruses but thus far appears unique 
for a cellular organism (Figure S3.3.1 in Additional file 1). 
Ac is able to survive under microaerophilic conditions 
such as those found in the deeper layers of underwater 
sediments or within the rhizosphere. The genome encodes 
a number of prolyl 4-hydroxylases that likely mediate oxy- 
gen response; however, Ac also contains a number of 
heme-nitric oxide/oxygen binding (H-NOX) proteins that, 
unlike those in other eukaryotes, are not found in con- 
junction with guanylyl cyclases [33]. The Ac H-NOX pro- 
teins lack a critical tyrosine residue in the non-polar distal 
heme pocket, making it likely that they are for nitric oxide 
(NO) rather than oxygen signaling [34]. Both Dd and Ac 
are responsive to light, although the photoreceptor that
mediates phototaxis in Dictyostelium has yet to be
identified [35]. We identified two rhodopsins, both with
carboxy-terminal histidine kinase and response regulator
domains with homology to the sensory rhodopsins of the
green algae, that represent candidates for light sensors in
Ac (Figure 3).

Clarke et al. Genome Biology 2013, 14:R11
http://genomebiology.com/content/14/2/R11

Figure 2 legend keys: Archaea, Bacteria, Viruses; Environmental, Host-associated; Aerobic, Facultative, Anaerobic; genome labels A, N, Ed, Eh, D.

Figure 2 Predicted LGT-derived genes from Bacteria, Archaea and viruses encoded in the genomes of free-living and parasitic
amoebae. LGT-derived genes were predicted using a phylogenomics approach consisting of an initial similarity-based screening using SIMAP
[111], several filtering steps to extract amoebal proteins with prokaryotic best hits, followed by automatic calculation and manual inspection of
phylogenetic trees using PhyloGenie and PHAT [112]. (a) Percentage of lineage-specific LGT candidates in each genome; the absolute number of
LGT candidates per genome is indicated next to each bar. (b) Heat map illustrating the Bray-Curtis similarity of the taxonomic affiliation (at the
level of classes within the domain Bacteria) of putative LGT donors. (c) Ecological classification of putative LGT donors with respect to their
oxygen requirement and association with a host. The ecology of putative donors was extrapolated from the lifestyles of the respective closest
extant relatives.
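
The Figure 2b legend refers to a Bray-Curtis similarity between the taxonomic profiles of putative LGT donors. As a minimal sketch of how such a value is computed, assuming each genome is summarized as counts of donor genes per bacterial class, the standard definition is one minus the summed absolute differences divided by the summed totals. The function name and the toy counts below are invented for illustration and do not reproduce the authors' data or software.

```python
def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity between two count profiles keyed by taxon.

    Returns 1.0 for identical profiles and 0.0 for profiles that share
    no taxa. Missing taxa are treated as zero counts.
    """
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1.0 - diff / total


# Invented donor-class counts for two hypothetical genomes.
genome_x = {"Alphaproteobacteria": 10, "Bacilli": 5}
genome_y = {"Alphaproteobacteria": 8, "Bacilli": 5, "Clostridia": 2}
sim = bray_curtis_similarity(genome_x, genome_y)
print(f"Bray-Curtis similarity: {sim:.3f}")
```

Computing this value for every pair of genomes yields the symmetric matrix that a heat map such as Figure 2b would display.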

Cellular response 

Modulation of cellular response to environmental cues is
enacted by a diversity of protein kinases and Ac is predicted
to encode 377, the largest number predicted to date for any
amoebozoan (Section 4 of Additional file 1). In Ac, the
mitogen-activated protein kinase (MAPK) kinase pathway
has been shown to be involved in encystment [36] and its
genome encodes homologues of both of Dd's two MAPK
proteins, ErkA and ErkB [37]. Phosphotyrosine (pTyr)
signaling mediated through tyrosine kinases was until recently
thought to be generally absent from the amoebozoan
lineage [38]. This signaling capacity has been associated with
intercellular communication, the evolutionary step towards
multicellularity and the expansion of organismal complexity
in metazoans [39]. pTyr is thought to depend upon a triad
of signaling molecules: tyrosine kinase 'writers' (PTKs),
tyrosine phosphatase 'erasers' (PTPs) and Src homology 2 (SH2)
'reader' domains that connect the phosphorylated
ligand-containing domains to specify downstream signaling events
[39]. Remarkably, the genome of Ac encodes 22 PTKs, 12
PTPs, and 48 SH2 domain-containing proteins (Figure 4a),
revealing a primordial yet elaborate pTyr signaling system
in the amoebozoan lineage (Figure 4b).

Figure 3 Phylogenetic tree of rhodopsins from Amoebozoa, algae, bacteria and fungi. The tree was constructed by the neighbor-joining
method based on the amino acid sequence of the rhodopsin domain using MEGA version 5 [113]. The scale bar indicates the number of
substitutions per site. Detailed rhodopsin information is listed in Table S3.6.1 in Additional file 1.

The Ac PTK domains are highly conserved in key
catalytic residues, resembling dedicated PTKs found in
metazoans (Figure S4.2.1 in Additional file 1), and are
distinct from Dd and Eh PTKs that are more tyrosine
kinase-like (TKL) (Figure S4.2.2 in Additional file 1). Ac
PTK homologues are present in the apusomonad
Thecamonas trahens and have also recently been
described in two filasterean species, Capsaspora
owczarzaki and Ministeria vibrans [38]. One unusual feature of
the pTyr machinery in Ac is the 2:1 ratio of SH2 to
PTK domains, as comparisons across opisthokonts show
a strong correlation and co-expansion of these two
domains with a ratio close to 1:1 (Figure 4c,d) [40]. This
increased ratio in Ac indicates either an expansion to
handle the cellular requirements of pTyr signaling or
that aspects of PTK function are accomplished by TKL
or dual specificity kinases, as appears to be the case in
Dd [41]. We also found that Ac has fewer tyrosine
residues in its proteome in comparison to Dd, which lacks
PTKs (Figure S4.3.1 in Additional file 1). This result is
in line with recent analysis of metazoan genomes,
suggesting increased pressure for selection against
disadvantageous phosphorylation of tyrosine residues in
genomes with extensive pTyr signaling [42].
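
The approximate 2:1 ratio mentioned in this section can be checked against the counts reported earlier (22 PTKs and 48 SH2 domain-containing proteins). The short Python sketch below is only an arithmetic illustration; it assumes that protein counts are an acceptable proxy for domain counts, which the text does not state explicitly, and the function name is hypothetical.

```python
def domain_ratio(sh2_count: int, ptk_count: int) -> float:
    """Return the SH2-to-PTK ratio as a single number (SH2 per PTK)."""
    if ptk_count <= 0:
        raise ValueError("ptk_count must be positive")
    return sh2_count / ptk_count


# Counts taken from the text: 48 SH2 domain-containing proteins, 22 PTKs.
ratio = domain_ratio(48, 22)
print(f"SH2:PTK ratio in Ac is about {ratio:.1f}:1")
```

The value of roughly 2.2 is consistent with the 2:1 ratio described above and contrasts with the near 1:1 relationship reported across opisthokonts.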

Domain organization and composition of pTyr com- 
ponents reveal the selective pressures for adapting pTyr 
signaling into various pathways. Seven PTKs have pre- 
dicted transmembrane domains and may function as 
receptor tyrosine kinases hinting at their potential for 
intercellular communication. The majority of PTKs in 
Ac, however, show unique domain combinations; six 
PTKs contain a sterile alpha motif (SAM) domain, 
which is found in members of the ephrin receptor 
family (Figure S4.4.3 in Additional file 1). The Ac SH2 
proteins are conserved within the pTyr binding pocket 
and resemble SH2 domains from the SOCS, RIN, CBL 
and RASA families (Figure S4.4.2 in Additional file 1); 
however, the domain composition within these proteins 
differs between those of Monosiga brevicollis and metazo- 
ans (Figure S4.4.3A in Additional file 1). Approximately 
half of the Ac SH2 proteins share domain architectures 
with Dd, including the STAT family of transcription fac- 
tors (Figure S4.4.3B in Additional file 1). The presence of 
homologous SH2 proteins in Dd coupled with the com- 
plete facility in Ac predicts an emergence of the complete 
machinery for pTyr early in the Unikont lineage. This 
finding is in contrast with models that posit a complete 
pTyr signaling machinery emerging late in the Unikont 
lineage [39] and has important implications for under- 
standing the relationship between pTyr signaling and the 
evolution of multicellularity. The lack of clear metazoan 
orthologues makes it difficult to trace the evolutionary 
paths of pTyr signaling networks [43] or to accurately 
predict the cellular functions and adaptations of pTyr in 
Ac. However, with phosphoproteomics and sequence
analysis, insights into ancient pTyr signaling circuits may 
be revealed through future studies in Ac (Figure S4.5.1 in 
Additional file 1). 

Cell adhesion 

Ac is not known to participate in social activity yet must 
adhere to a diversity of surfaces within the soil and 
practice discrimination between self and prey during 
phagocytosis [44]. Ac shares some adhesion proteins 
with Dd (Table S5.1.1 in Additional file 1) but homolo- 
gues of the calcium-dependent, integrin-like Sib cell- 
adhesion proteins are absent. Surprisingly, Ac contains a 
number of bacterial-like integrin and hemagglutinin 
domain adhesion proteins that may improve its ability 
to attach to bacterial cells or biofilms [45]. Ac encodes
two MAM domain-containing proteins, a domain found
in functionally diverse receptors with roles in cell-cell 
adhesion [46]. Ac has a copy of the laminin-binding pro- 
tein (AhLBP) first identified in Acanthamoeba healyi, 
which has been shown to act as a non-integrin laminin 
binding receptor [47]. Remarkably, Ac also encodes pro- 
teins containing cell adhesion immunoglobulin domains 
(Section 5 of Additional file 1). Both show affinity to the 
I-set subfamily [48] and contain weakly predicted trans- 
membrane domains (Figure S5.1.1 in Additional file 1). 

Microbial recognition through pattern recognition 
receptors 

Ac grazes on a variety of microfauna, which requires
the mobilization of a set of defense responses initiated
upon microbial recognition. In vertebrates, molecular
signatures often termed microbe-associated molecular
patterns (MAMPs) [49] are detected by pattern-recognition
receptors (PRRs) that activate downstream
transcriptional responses. As Ac practices selective feeding
behavior, we looked for the presence of predicted PRRs
in the Ac genome (Figure 5). One of the best-studied
MAMPs is lipopolysaccharide and discrimination 
mediated through lectin-mediated protein-carbohydrate 
interactions is an important innate immunity strategy in 
both vertebrates and invertebrates [50]. Ac contains six 
members of the bactericidal permeability-increasing pro- 
tein (BPI)/lipopolysaccharide-binding protein (LBP) 
family and two peptidoglycan binding proteins (Figure 5; 
Section 6 of Additional file 1). Ac also encodes a mem- 
brane bound homologue of an MD-2-related protein 
that, in vertebrate immunity, has been implicated in 
opsonophagocytosis of Gram-negative bacteria through 
its interactions with lipopolysaccharide [51]. 

Receptor-mediated endocytosis of Legionella pneumo- 
phila in Ac is mediated by the c-type lectin mannose 
binding protein (MBP) [52]. MBP also represents the 
principal virulence factor in pathogenic Acanthamoebae 
[53]. In addition to MBP, the Ac genome encodes two 
paralogues of MBP with similarity to the amino-terminal 
region of the protein. Rhamnose-binding lectins serve a 
variety of functions in invertebrates, one of which is their 
role as germline-encoded PRRs in innate immunity [54]. 
They are absent from other Amoebozoa, although Ac 
encodes 11 D-galactoside/L-rhamnose binding (SUEL) 
lectin domain-containing proteins. Approximately half of 
the SUEL lectin domain proteins harbour epidermal 
growth factor domains, a combination reminiscent of 
the selectin family of adhesion proteins found exclusively 
in vertebrates [55]. An L-rhamnose synthesis pathway 
thought to contribute to biosynthesis of the lipopolysac- 
charide-like outer layer of the virus particle has recently 
been identified in Mimivirus that may facilitate its uptake 
by Ac [56,57]. Ac also encodes a protein where multiple
copies of H-type lectin are joined with an inhibitor of
apoptosis domain. The H-lectin domain is predicted to
bind to N-acetylgalactosamine (GalNAc) and is found in
Dictyostelium discoidin I & II [58] and in invertebrates,
where it plays a role in antibacterial defense [59].
In the brown alga Ectocarpus, leucine-rich repeat
(LRR)-containing GTPases of the ROCO family and
NB-ARC-TPR proteins have been proposed to represent PRRs that
are involved in immune response [60]. Ac encodes an
NB-ARC-TPR homologue with a disease resistance domain
(IPR000767) and an LRR-ROCO GTPase.

Antimicrobial defense 

Ac encodes proteins with potential roles in antiviral 
defense including homologues of NCLDV major capsid 
proteins [61] as well as homologues of Dicer and Piwi, 
both of which have been implicated in RNA-mediated 
antiviral silencing [62]. Our data also illustrate early evolu- 
tion of a number of interferon-inducible innate immunity 
proteins absent from other sequenced Amoebozoa. These 
include a homologue of the interferon-γ-inducible
lysosomal thiol reductase enzyme (GILT), an important host
factor targeted by Listeria monocytogenes during infection in
macrophages [63]. In addition, Ac encodes two
interferon-inducible GTPase homologues, which in vertebrates
promote cell-autonomous immunity to vacuolar bacteria,
including Mycobacteria and Legionella species [64]. Ac
also contains a natural resistance-associated macrophage 
protein (NRAMP) homologue, which has been implicated 
in protection against L. pneumophila and Mycobacterium 
avium infection in both macrophages and Dd [65]. 

Metabolism 

Ac has traditionally been considered to be an obligate 
aerobe, although the recent identification of the oxygen- 
labile enzymes pyruvate:ferredoxin oxidoreductase and 
FeFe-hydrogenase perhaps pointed towards a cryptic capa- 
city for anaerobic ATP production [66]. Predictions for 
nitrite and fumarate reduction, hydrogen fermentation, 
together with a likely mechanism for acetate synthesis, 
coupled to ATP production indicate a considerable capa- 
city for anaerobic ATP generation. This clearly sets Ac 
apart from Dd, which hunts within the aerobic leaf litter, 
but provides parallels with Ng, the alga Chlamydomonas 
reinhardtii and other soil-dwelling protists that are likely 
to experience considerable variation in local oxygen ten- 
sions [67]. These protists achieve their flexible, facultative 
anaerobic metabolism, however, using different pathways 
(Figure S7.1 in Additional file 1). In addition, the classic
anaerobic twists on glycolysis provided by
pyrophosphate-dependent phosphofructokinase and pyruvate phosphate
dikinase [68] are absent from Ac. This suggests that
although multiple pathways are available for oxidation of
NADH to NAD+ in the absence of oxygen, including a
capacity for anaerobic respiration in the presence of nitrite
(NO2−), a shift to a more ATP-sparing form of glycolysis is
not necessary under low oxygen tension. Given
genome-led predictions of facultative anaerobic ATP metabolism,
as well as extensive use of receptors and signaling
pathways classically associated with animal biology, we also
considered the possibility of a hypoxia-inducible factor
(HIF)-dependent system for oxygen sensing, similar to
that seen across the animal kingdom, including the simple
animal Trichoplax adhaerens [69,70]. However, despite
conservation of a Skp1/HIFα-related prolyl hydroxylase in
Ac, we found no genes encoding proteins with the typical
domain architecture of animal HIFα or HIFβ. Currently,
therefore, HIF-dependent oxygen sensing remains
restricted to metazoan lineages.

Ac also retains biosynthetic pathways involved in ana- 
bolic metabolism that are absent in Dd (for example, the 
shikimic acid pathway and a classic type I pathway for 
fatty acid biosynthesis; Table S7.1 in Additional file 1), 
although investment in extensive polyketide biosynthesis 
[71] is not evident. An autophagy pathway, as defined by 
genetic studies of yeast, Dd and other organisms [72], is 
present in Ac with little paralogue expansion or loss of 
known autophagy-related (ATG) genes evident (Figure 7.2 
in Additional file 1) and likely contributes to both intracel- 
lular re-modeling in response to environmental cues and 
the interaction with phagocytosed microbes. 

Transcription factors 

Ac shares a broadly comparable repertoire of transcription 
factors with Dd excepting a number of lineage-specific 
expansions (Table S8.1 in Additional file 1). Ac encodes 22 
zinc cluster transcription factors compared to the 3 in Dd 
(Figure S8.2.1 in Additional file 1) [73]. It has almost dou- 
ble the number of predicted homeobox genes (25) com- 
pared to the 13 in Dd [74]. Two are of the MEIS and PBC 
class respectively, with an expansion in a homologue of 
Wariai, a regulator of anterior-posterior patterning in 
Dictyostelium [75] comprising most of the additional 
members (Figure S8.3.2 in Additional file 1). Strikingly, we
also identified 22 Regulatory factor X (RFX) genes, the first
identified in an amoebozoan [76]. The Ac RFX repertoire
is the earliest branching yet identified and forms an
outgroup to other known RFX genes (Section 8 of Additional
file 1). Ac has been proposed to affect plant root branching
in the rhizosphere via its effects on auxin balance in plants
[77]. It encodes a number of genes involved in auxin
biosynthesis as well as those involved in free auxin
(indole-3-acetic acid (IAA)) de-activation via formation of
IAA conjugates (Table S9.1 in Additional file 1). These
data suggest that Ac plays a role in altering the level of
IAA in the rhizosphere through a strategy of alternative
biosynthesis and sequestration. Ac may also respond
transcriptionally to auxin as it encodes a member of the
calmodulin-binding transcription activator (CAMTA) family
(Figure S8.4.1 in Additional file 1), which in plants
coordinates stress responses via effects on auxin signaling
[78,79].

Conclusions 

Comparative genomics of the Amoebozoa has until now 
been restricted to comparisons between the multicellular 
dictyostelids and the obligate parasite Eh [80,81]. Ac, 
while sharing many of their features, enriches the reper- 
toire of amoebozoan genomes in a number of important 
areas, including signaling and pattern recognition. LGT 
has significantly contributed to both the genome and 
transcriptome of Ac whose accessory genome shares 
unexpected similarities with a phylogenetically distant 
amoeba. The presence of prokaryotic TEs in Ac illustrates
its role in the evolution of some of the earth's
most unusual organisms [82] as well as a number of
important human pathogens [7,8,83].

Ac has adopted bacterial-like adhesion proteins to
facilitate adherence to biofilms and H-NOX based nitric oxide
signaling which likely aids in their dispersal [84]. Overall,
the adaptive value conferred by LGT is highlighted by the
expression of the large majority of LGT candidates in Ac
across multiple conditions, which points to their adoption
into novel transcriptional networks. Given the feeding
behavior of Ac, it seems plausible that eukaryote-to-eukaryote
gene transfers may also have provided adaptive benefits [23].
Increased sampling will be necessary to establish the extent
to which such gene transfers made their way into the Ac
genome and whether 'you are what you eat' equally applies
to a diet of eukaryotes [23].

Ac participates in a myriad of as yet unexplored
interactions, as reflected in the diversity of genes devoted to
sensory perception and signal transduction of extracellular
stimuli. Ac's survival in the rhizosphere is likely contingent
on interactions not only with other microbes but also on a
cross-talk with plant roots through manipulation of the
levels of the plant hormone auxin. LGT may also have
provided Ac with some of its recognition and environmental
sensing components. An interesting parallel is the
planktonic protozoan Oxyrrhis marina, which utilizes
both MBP and LGT-derived sensory rhodopsins to enable
selective feeding behavior through prey detection and
biorecognition [85]. We predict that host response of Ac
to pathogens and symbionts is likely modulated via a
diversity of predicted PRRs that act in an analogous manner
to effectors of innate immunity in higher organisms.
Given the close association of Ac with a number of
important intracellular pathogens, it will be interesting to
determine which host-pathogen interactions can trace their
origins to encounters with primitive cells such as Ac.

Ac shares protein family expansions in signal
transduction with other Amoebozoa while introducing new
components based on novel domain architectures
(nucleotidyl cyclases) [86]. The presence of the complete
pTyr signaling toolkit, especially when contrasted with
its absence in the multicellular dictyostelids, is a
remarkable finding of the Ac genome analysis. However, the
role of tyrosine kinase signaling in both amoebozoan
and mammalian phagocytosis [87-89] indicates that it
likely represents an ancestral function. The most
parsimonious interpretation predicts the supplanting of
functions originally carried out by tyrosine kinases by other
kinases in the Amoebozoa. This emphasizes the importance
of representative sampling and, in its absence, the
inherent difficulties in re-constructing ancestral
signaling capacities.

Transcriptional response networks can be reprogrammed
either through expansion of transcription factors
or their target genes [90]. Ac and Dd share a conserved 
core of transcription factors with any differences between 
them largely accounted for by lineage-specific amplifica- 
tions. These may result in sub- or neo-functionalization 
contributing to the adaptive radiation of Acanthamoebae 
into new ecological niches. 

Comparison of Ac with Dd highlights a broadly similar 
apparatus for environmental sensing and cell-cell com- 
munication and implies that the molecular elements 
underpinning the transition to a multicellular lifestyle 
may be widespread. Such transitions would likely have 
involved co-option of ancestral functions into multicellu- 
lar programs and have occurred multiple times. Our ana- 
lysis suggests that many signal processing and regulatory 
modules of higher animals and plants likely have deep 
origins, balanced by subsequent losses in certain
lineages, including the loss of tyrosine kinases in fungi,
plants and many protists.

The availability of an Ac genome offers the first opportu- 
nity to initiate functional genomics in this important con- 
stituent of a variety of ecosystems and should foster a 
better understanding of the amoebic lifestyle. Utilizing the 
genome as a basis for unraveling the molecular interac- 
tions between Ac and a variety of human pathogens will 
provide a platform for understanding the contributions of 
environmental hosts to the evolution of virulence. 

Materials and methods 

DNA isolation 

Ac strain Neff (ATCC 30010) was grown at 30°C with 
moderate shaking to an OD550 of approximately 1.0. Total 
nucleic acid preparations were depleted of mitochondrial 
DNA contamination via differential centrifugation of cell 
extracts [91]. High molecular weight DNA was extracted 
from nuclear pellets either on Cesium chloride-Hoechst
33258 dye gradients as per [92] or by utilizing the Qiagen
Genomic-tip 20/G kit (Qiagen, Hilden, Germany). 

Genomic DNA library preparation and sequencing 

All genomic DNA libraries were generated according to 
the Illumina protocol Genomic DNA Sample Prep Guide - 
Oligo Only Kit (1003492 A); sonication was substituted for 
the recommended nebulization as the method for DNA 
fragmentation, using a Bioruptor™ (Diagenode, Liège,
Belgium). Library preparation (end repair to create
blunt-ended fragments, addition of a 3'-A overhang for
efficient adapter ligation, ligation of the adapters, and
size selection of adapter-ligated material) was carried out
using the enzymes indicated in the protocol. Adapters
and amplification primers were purchased from Illumina 
(Illumina, San Diego, CA, USA); both Single Read Adapters
(FC-102-1003) and Paired End Adapters (catalogue
number PE-102-1003) were used in library construction.
All enzymes for library generation were purchased from 
New England Biolabs (Ipswich, MA, USA). A limited 14-cycle
amplification of size-selected libraries was carried
out. To eliminate adapter-dimers, libraries were further
size selected on 2.5% TAE agarose gels. Purified libraries
were quantified using a Qubit™ fluorometer (Invitrogen,
Carlsbad, CA, USA) and a Quant-iT™ double-stranded
DNA High-Sensitivity Assay Kit (Invitrogen). Clustering
and sequencing of the material were carried out as per the
manufacturer's instructions on the Illumina GAII platform
in the UCD Conway Institute (UCD, Dublin, Ireland). 

RNA extraction and RNA.seq library preparation and 
sequencing 

For all tested conditions (Table S1.6.1 in Additional file 1)
except the infection series, RNA was extracted from a
minimum of 1 X 10^ cells using TRIzol® (Invitrogen/Life
Technologies, Paisley, UK). For infection material the 
detailed protocol is published in [93]. Strand-specific 
RNA.seq libraries were generated from total RNA using a 
modified version of [94] which is detailed in [93]. Briefly, 
total RNA was poly(A) selected, fragmented, reverse tran- 
scribed and second strand cDNA marked with the addi- 
tion of dUTP. Standard Illumina methodology was 
followed - end-repair, A-addition, adapter ligation and 
library size selection - with the exception of the use of 
'home-brew 6-nucleotide indexed' adapters as per Craig et
al. [95]. Prior to limited amplification of the libraries, the
dUTP marked second strand was removed via Uracil 
DNA-Glycosylase (Bioline, London, UK) digestion. Final 
libraries were quantified using the High Sensitivity DNA 
Quant-iT™ assay kit and Qubit™ Fluorometer (Invitro- 
gen/Life Technologies). All sequencing was carried out in 
UCD Conway Institute on an Illumina GAII as per the 
manufacturer's instructions.

Sequencing and assembly

Genome assembly was carried out using a two-step pro- 
cess. Firstly, the Illumina reads were assembled using 
the Velvet [96] short read assembler to generate a series 
of contigs. These assembled contigs were used to gener- 
ate a set of pseudo-reads 400 bp in length. These 
pseudo reads were then assembled in conjunction with 
the 454 FLX and Sanger sequences using version 2.3 of 
the GS De Novo Assembler using default parameters 
(Table S1.1.1 in Additional file 1). The assembly contained
45.1 Mb of scaffold sequence, of which 3.4 Mb
(7.5%) represents gaps, and 75% of the genome is contained
in fewer than 100 scaffolds. For assembly statistics
see Table S1.2.1 in Additional file 1. In order to determine
the coverage of the transcriptome, we aligned our
genome assembly to a publicly available EST dataset 
from GenBank (using the Entrez query (acanthamoeba
EST) AND 'Acanthamoeba castellanii'[porgn:txid5755]).
Of the 13,784 EST sequences downloaded, 12,975 (94%) 
map over 50% of their length with an average percent 
identity of 99.2% and 12,423 (90%) map over 70% of 
their length with an average percent identity of 99.26%. 
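
The text gives the pseudo-read length (400 bp) but not how the pseudo-reads were cut from the Velvet contigs. The Python sketch below shows one plausible way to tile a contig into fixed-length pseudo-reads. The 200 bp step (50% overlap), the treatment of contigs shorter than one read, and the function name `tile_pseudo_reads` are illustrative assumptions, not the authors' procedure.

```python
from typing import Iterator, Tuple


def tile_pseudo_reads(
    contig_id: str,
    sequence: str,
    read_len: int = 400,  # pseudo-read length stated in the text
    step: int = 200,      # ASSUMPTION: the overlap is not given in the text
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pseudo-reads tiled across one contig."""
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    n = len(sequence)
    if n <= read_len:
        # ASSUMPTION: contigs no longer than one read are emitted whole.
        yield f"{contig_id}_0", sequence
        return
    starts = list(range(0, n - read_len + 1, step))
    # Add a final window flush with the contig end so no bases are dropped.
    if starts[-1] != n - read_len:
        starts.append(n - read_len)
    for start in starts:
        yield f"{contig_id}_{start}", sequence[start:start + read_len]


if __name__ == "__main__":
    toy_contig = "ACGT" * 250  # 1,000 bp toy contig, for illustration only
    for name, read in tile_pseudo_reads("contig1", toy_contig):
        print(name, len(read))
```

Overlapping windows give the second assembler shared sequence between neighbouring pseudo-reads, so it can rebuild the original contig. The appropriate overlap would depend on the assembler's minimum overlap settings.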

Gene structure prediction 

Gene finding was carried out on the largest 384 scaffolds 
of the Ac assembly using an iterative approach: gene
models were first generated directly from RNA.seq data and
used to train a gene-finding algorithm within a genome
annotation pipeline, followed by manual curation. Firstly,
predicted transcripts were generated using RNA.seq data from
a variety of conditions (Table S1.4.1 in Additional file 1) in
conjunction with the G.Mo.R-Se algorithm (Gene Modelling
using RNA.seq), an approach aimed at building gene mod- 
els directly from RNA.seq data [97] running with default 
parameters. This algorithm generated 20,681 predicted 
transcripts. We then used these predicted transcripts to 
train the genefinder SNAP [98] using the MAKER genome 
annotation pipeline [99,100]. MAKER is used for the 
annotation of prokaryotic and eukaryotic genome projects. 
It identifies repeats, aligns ESTs (in this case the transcripts
generated by the G.Mo.R-Se algorithm) and proteins
(from the nr database) to a genome, produces ab initio gene
predictions and automatically synthesizes these data into
gene annotations. The 17,013 gene predictions generated
by MAKER were then manually annotated using the 
Apollo genome annotation curation tool [101,102]. Apollo 
allows the deletion of gene models, the creation of gene 
models from annotations and the editing of gene starts, 
stops, and 3' and 5' splice sites. Models were manually 
annotated examining a variety of evidence, including 
expressed sequence data and matches to protein databases 
(Section 1 of Additional file 1). Out of a total of 113,574
exons, 32,836 are exactly covered and 64,724 are partially
covered by transcripts, and 7,193 genes have at least 50%
of their entire lengths covered by transcript data.
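
The text does not define how "exactly" and "partially" covered exons were distinguished. A minimal Python sketch follows, assuming that exons and transcript-derived blocks are half-open (start, end) intervals on the same scaffold and strand. Under this assumption an exon is exactly covered when a transcript block reproduces both of its boundaries, and partially covered when a block overlaps it without matching both boundaries. The function names and toy coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one scaffold and strand


def classify_exon(exon: Interval, blocks: Iterable[Interval]) -> str:
    """Return 'exact', 'partial' or 'none' for one annotated exon."""
    start, end = exon
    status = "none"
    for b_start, b_end in blocks:
        if (b_start, b_end) == (start, end):
            return "exact"      # both exon boundaries are reproduced
        if b_start < end and start < b_end:
            status = "partial"  # overlap without matching both boundaries
    return status


def summarize(exons: List[Interval], blocks: List[Interval]) -> Dict[str, int]:
    """Count exons in each coverage class."""
    return dict(Counter(classify_exon(e, blocks) for e in exons))


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    exons = [(100, 200), (300, 450), (600, 700)]
    blocks = [(100, 200), (320, 450)]
    print(summarize(exons, blocks))

    # Fractions implied by the exon counts reported in the text.
    total, exact, partial = 113_574, 32_836, 64_724
    print(f"exact: {exact / total:.1%}, partial: {partial / total:.1%}")
```

The nested scan is quadratic and is only meant to make the definitions explicit. A genome-scale comparison would normally sort the intervals or use an interval index.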

Functional annotation assignments 

Functional annotation assignments were carried out 
using a combination of automated annotation as 
described previously [103] followed by manual annota- 
tion. Briefly, gene level searches were performed against 
protein, domain and profile databases, including JCVI 
in-house non-redundant protein databases, UniRef [104],
Pfam [105], TIGRfam HMMs [106], Prosite [107], and 
InterPro [108]. After the working gene set had been 
assigned an informative name and a function, each 
name was manually curated and changed where it was 
felt a more accurate name could be applied. Predicted 
genes were classified using Gene Ontology (GO) [109]. 
GO assignments were attributed automatically, based on 
other assignments from closely related organisms using 
Pfam2GO, a tool that allows automatic mapping of 
Pfam hits to GO assignments. 
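
The Pfam2GO step can be pictured as a lookup from each gene's Pfam hits to the GO terms listed for those families. The Python sketch below assumes the mapping is supplied in the plain-text `pfam2go` layout, with lines of the form `Pfam:PF00069 Pkinase > GO:protein kinase activity ; GO:0004672`. The gene identifiers and hits in the example are invented for illustration, and the function names are hypothetical rather than part of any pipeline described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def parse_pfam2go(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map Pfam accessions to sets of GO identifiers."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # '!' marks header lines
            continue
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # skip lines that do not follow the assumed layout
        pfam_acc = left.split()[0].split(":", 1)[-1]   # 'Pfam:PF00069' -> 'PF00069'
        go_id = right.rsplit(";", 1)[-1].strip()        # text after the last ';'
        mapping[pfam_acc].add(go_id)
    return dict(mapping)


def assign_go(
    gene_hits: Dict[str, List[str]], pfam2go: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Collect the GO terms implied by each gene's Pfam hits."""
    assignments: Dict[str, Set[str]] = {}
    for gene, accessions in gene_hits.items():
        terms: Set[str] = set()
        for acc in accessions:
            terms |= pfam2go.get(acc, set())
        assignments[gene] = terms
    return assignments


if __name__ == "__main__":
    sample = [
        "! illustrative excerpt in the assumed pfam2go layout",
        "Pfam:PF00069 Pkinase > GO:protein kinase activity ; GO:0004672",
        "Pfam:PF00069 Pkinase > GO:ATP binding ; GO:0005524",
        "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    ]
    hits = {"gene_A": ["PF00069"], "gene_B": ["PF00001", "PF99999"]}  # invented
    print(assign_go(hits, parse_pfam2go(sample)))
```

Genes whose Pfam hits have no entry in the mapping receive an empty set, which is one reason automated GO assignment is usually followed by manual review.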

Data access 

This whole genome shotgun project has been deposited 
at DDBJ/EMBL/GenBank under the accession 
AHJI00000000. The version described in this paper is
the first version, AHJI01000000. The RNA.seq data are
available under accessions SRA061350 and SRA061370- 
SRA061379. 

Additional material 



Additional file 1: Supplementary online material. 

Additional file 2: Supplementary material supporting the LGT 
analysis. 